Diffusion MRI tractography is an advanced imaging technique that enables in vivo mapping of the brain's white matter connections. White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts, enabling quantification and visualization of whole-brain tractography. Currently, most parcellation methods focus on the deep white matter (DWM), while fewer methods address the superficial white matter (SWM) due to its complexity. We propose a novel two-stage deep-learning framework, Superficial White Matter Analysis (SupWMA), that performs efficient and consistent parcellation of 198 SWM clusters from whole-brain tractography. A point-cloud-based network is adapted to our SWM parcellation task, and supervised contrastive learning enables more discriminative representations between plausible streamlines and outliers. We train our model on a large-scale tractography dataset including streamline samples from labeled SWM clusters and anatomically implausible streamline samples, and we test it on six independently acquired datasets spanning different ages and health conditions (including neonates and patients with space-occupying brain tumors). Compared to several state-of-the-art methods, SupWMA obtains highly consistent and accurate SWM parcellation results on all datasets, generalizing well across the lifespan in health and disease. In addition, SupWMA's computation is much faster than that of the other methods.
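The supervised contrastive objective used to separate plausible streamlines from outliers can be illustrated on toy embeddings. The sketch below assumes cosine similarity and a temperature of 0.1; these choices, and the exact batch construction, are illustrative rather than the paper's settings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def supcon_loss(embeddings, labels, tau=0.1):
    """Supervised contrastive loss over one batch: each anchor is pulled
    toward same-label embeddings and pushed from all other embeddings."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(cosine(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        total += -sum(math.log(math.exp(cosine(embeddings[i], embeddings[p]) / tau) / denom)
                      for p in positives) / len(positives)
    return total / n
```

With aligned same-cluster embeddings the loss is near zero, while mismatched labels drive it up, which is what pushes outlier streamlines away from cluster representations.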
translated by 谷歌翻译
White matter tract microstructure has been shown to influence neuropsychological scores of cognitive performance. However, prediction of these scores from white matter tract data has not been attempted. In this paper, we propose a deep-learning-based framework for the prediction of neuropsychological scores using microstructure measurements estimated from diffusion magnetic resonance imaging (dMRI) tractography, focusing on performance on a receptive vocabulary assessment task based on a critical fiber tract for language, the arcuate fasciculus (AF). We directly utilize information from all points in the fiber tract, without the need to average data along the fiber as is traditionally required by diffusion MRI tractometry methods. Specifically, we represent the AF as a point cloud with a microstructure measurement at each point, enabling the adoption of point-based neural networks. We improve prediction performance with a proposed Paired-Siamese Loss that utilizes information about differences between continuous neuropsychological scores. Finally, we propose a Critical Region Localization (CRL) algorithm to localize informative anatomical regions containing points with strong contributions to the prediction results. Our method is evaluated on data from 806 subjects from the Human Connectome Project dataset. The results demonstrate superior performance in neuropsychological score prediction compared to baseline methods. We find that critical regions in the AF are highly consistent across subjects, with the strongest contributions from AF points adjacent to frontal cortical regions (i.e., the caudal middle frontal gyrus, pars opercularis, and pars triangularis) that are strongly implicated in language processes.
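The idea of supervising pairwise score differences can be sketched as a plain regression loss plus a penalty on pairwise prediction gaps. This is an illustrative form only; the paper's exact Paired-Siamese Loss may weight or pair terms differently:

```python
def paired_siamese_loss(preds, scores, lam=1.0):
    """Mean squared error on individual scores plus a squared penalty on
    pairwise score differences, so the model also learns the ordering and
    spacing between subjects' continuous scores (illustrative form)."""
    n = len(preds)
    mse = sum((p - s) ** 2 for p, s in zip(preds, scores)) / n
    pair_terms = [((preds[i] - preds[j]) - (scores[i] - scores[j])) ** 2
                  for i in range(n) for j in range(i + 1, n)]
    pair = sum(pair_terms) / len(pair_terms)
    return mse + lam * pair
```

Note that a uniform shift of all predictions leaves the pairwise term at zero, so the difference penalty specifically targets relative errors between subjects.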
Diffusion MRI tractography is an advanced imaging technique for quantitative mapping of the brain's structural connectivity. Whole-brain tractography (WBT) data contains hundreds of thousands of individual fiber streamlines (estimated brain connections), and such data are usually parcellated to create compact representations for data analysis applications such as disease classification. In this paper, we propose a novel parcellation-free WBT analysis framework, TractoFormer, that leverages tractography information at the level of individual fiber streamlines and provides a natural mechanism for interpreting results via the attention mechanism of transformers. TractoFormer makes two main contributions. First, we propose TractoEmbedding, a novel and simple 2D image representation of WBT that encodes 3D fiber spatial relationships together with any feature of interest that can be computed from individual fibers (e.g., FA or MD). Second, we design a network based on vision transformers (ViTs) that includes: 1) data augmentation to overcome model overfitting on small datasets, 2) identification of discriminative fibers for interpreting results, and 3) ensemble learning to leverage fiber information from different brain regions. In a synthetic-data experiment, TractoFormer successfully identifies discriminative fibers with simulated group differences. In a disease classification experiment comparing several methods, TractoFormer achieves the highest accuracy in classifying schizophrenia vs. control. Discriminative fibers are identified in left-hemisphere frontal and parietal superficial white matter regions, which have previously been shown to be affected in schizophrenia patients.
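The core idea of encoding fibers as a 2D image can be sketched by rasterizing streamline points into a grid and accumulating a per-point scalar feature. This is a much-simplified sketch of the concept; the actual TractoEmbedding construction (projection, resolution, normalization) is not reproduced here:

```python
def tracto_embedding(streamlines, features, size=8, bounds=(-1.0, 1.0)):
    """Project 3D streamline points onto the x-y plane and accumulate a
    per-point scalar feature (e.g., FA) into a size x size grid, yielding
    a 2D image representation of the fibers (illustrative sketch)."""
    lo, hi = bounds
    grid = [[0.0] * size for _ in range(size)]
    for fiber, feats in zip(streamlines, features):
        for (x, y, _z), f in zip(fiber, feats):
            col = min(size - 1, max(0, int((x - lo) / (hi - lo) * size)))
            row = min(size - 1, max(0, int((y - lo) / (hi - lo) * size)))
            grid[row][col] += f
    return grid
```

The resulting grid can then be fed to an image model such as a ViT, which is what makes a parcellation-free, image-based analysis of WBT possible.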
White matter fiber clustering (WMFC) is an important strategy for white matter parcellation, enabling quantitative analysis of brain connections in health and disease. WMFC is usually performed in an unsupervised manner, without requiring labeled ground-truth data. While widely used WMFC approaches have shown good performance using classical machine learning techniques, recent advances in deep learning point toward fast and effective WMFC. In this work, we propose a novel deep learning framework for WMFC, Deep Fiber Clustering (DFC), which solves the unsupervised clustering problem as a self-supervised learning task with a domain-specific pretext task of predicting pairwise fiber distances. This enables the learned fiber representations to handle a known challenge in WMFC, namely the sensitivity of clustering to the order of points along fibers. We design a novel network architecture that represents input fibers as point clouds and allows the incorporation of additional sources of input information from gray matter parcellation. Thus, DFC makes use of combined information about white matter fiber geometry and gray matter anatomy to improve the anatomical coherence of fiber clusters. In addition, DFC performs outlier removal in a natural way by rejecting fibers with low cluster assignment probability. We evaluate DFC on three independently acquired cohorts, including data from 220 individuals across genders, ages (young and elderly adults), and health conditions (healthy controls and multiple neuropsychiatric disorders). We compare DFC to several state-of-the-art WMFC algorithms. Experimental results demonstrate the superior performance of DFC in terms of cluster compactness, generalization ability, anatomical coherence, and computational efficiency.
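A natural order-invariant target for a pretext task that predicts pairwise fiber distances is the minimum average direct-flip (MDF) distance, a standard streamline metric; whether DFC uses exactly this distance is an assumption here, but it illustrates why the learned representation becomes insensitive to point ordering along a fiber:

```python
import math

def mdf_distance(fa, fb):
    """Minimum average direct-flip (MDF) distance between two fibers with
    the same number of points: the mean point-to-point distance, taking
    the smaller of the direct and flipped orderings, so the result does
    not depend on the direction in which points are stored."""
    def mean_dist(a, b):
        return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return min(mean_dist(fa, fb), mean_dist(fa, list(reversed(fb))))
```

Because the target itself ignores point order, a network trained to regress it is pushed to embed a fiber and its reversed copy identically.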
Unsupervised person re-identification (ReID) is a challenging task with no data annotations to guide discriminative learning. Existing methods attempt to solve this problem by clustering extracted embeddings to generate pseudo labels. However, most methods ignore the intra-class gap caused by camera style variance, and the methods that do try to address the negative effect of camera style on the feature distribution are relatively complex and indirect. To solve this problem, we propose a camera-aware style separation and contrastive learning method (CA-UReID), which directly separates camera styles in the feature space with a designed camera-aware attention module. It explicitly divides the learned features into camera-specific and camera-agnostic parts, reducing the influence of different cameras. Moreover, to further narrow the gap across cameras, we design a camera-aware contrastive center loss to learn more discriminative embeddings for each identity. Extensive experiments demonstrate the superiority of our method over state-of-the-art methods on the unsupervised person ReID task.
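A generic contrastive center loss, softmax over distances to identity centers, conveys the flavor of pulling each embedding to its own identity while pushing it from others. This sketch omits the camera-aware part (per-camera centers) and is not the paper's exact formulation:

```python
import math

def contrastive_center_loss(feats, labels, tau=0.5):
    """Softmax over negative squared distances to identity centers:
    each feature is attracted to its own identity's center and repelled
    from the centers of other identities (illustrative form)."""
    grouped = {}
    for f, l in zip(feats, labels):
        grouped.setdefault(l, []).append(f)
    centers = {l: [sum(xs) / len(xs) for xs in zip(*fs)]
               for l, fs in grouped.items()}
    def d2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    loss = 0.0
    for f, l in zip(feats, labels):
        num = math.exp(-d2(f, centers[l]) / tau)
        den = sum(math.exp(-d2(f, c) / tau) for c in centers.values())
        loss += -math.log(num / den)
    return loss / len(feats)
```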
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.
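Generating a class center from a support mask is commonly done by masked average pooling over the support feature map; the sketch below shows that standard operation, though RefT's actual weighting module may differ in detail:

```python
def masked_class_center(feature_map, mask):
    """Masked average pooling: average the C-dimensional feature vectors
    over the spatial positions selected by the support mask, yielding one
    class-center vector. feature_map is H x W x C; mask is H x W in {0, 1}."""
    total, count = None, 0
    for row_f, row_m in zip(feature_map, mask):
        for vec, m in zip(row_f, row_m):
            if m:
                total = vec[:] if total is None else [t + v for t, v in zip(total, vec)]
                count += 1
    return [t / count for t in total]
```

The resulting center vector can then be used to re-weight query features, e.g. via a channel-wise product or an attention score against each query location.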
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models, while trading off model accuracy and efficiency well.
Despite significant progress in object categorization in recent years, a number of important challenges remain: mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-sized class vocabularies and typically requires a separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and address the problems of supervised, zero-shot, generalized zero-shot, and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to a 310K-class vocabulary, on the Animals with Attributes and ImageNet datasets.
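The distance constraints described above can be written as a hinge penalty: a labeled sample should be closer to its correct prototype than to every other vocabulary atom, by a margin. The sketch below is an illustrative form of that constraint (unweighted, Euclidean distance), not the paper's full weighted objective:

```python
import math

def vocab_margin_loss(embedding, prototypes, label, margin=1.0):
    """Hinge penalty encouraging the embedded sample to lie closer to its
    correct class prototype than to every other vocabulary prototype by
    at least `margin` (illustrative form of the distance constraints)."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    d_pos = dist(embedding, prototypes[label])
    return sum(max(0.0, margin + d_pos - dist(embedding, p))
               for l, p in prototypes.items() if l != label)
```

Because the sum ranges over all vocabulary atoms, including unsupervised ones, the open vocabulary shapes the embedding of supervised classes as well, which is the core of vocabulary-informed learning.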
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explicate network outputs in a post-hoc fashion, under an implicit assumption that faithful explanations come from accurate predictions/classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in those neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, with applications to uncovering altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, our NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and respective discriminative representations to accurately recognize preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge regarding brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer leads to quantitatively reliable explanation results that are qualitatively consistent with representative neuroimaging studies.
Improving the visual quality of a degraded observation by correcting its exposure level is a fundamental task in the computer vision community. Existing works commonly lack adaptability towards unknown scenes because of data-driven patterns (deep networks) or limited regularization (traditional optimization), and they usually require time-consuming inference. These two points heavily limit their practicality. In this paper, we establish a Practical Exposure Corrector (PEC) that combines efficiency and performance. To be concrete, we rethink exposure correction to provide a linear solution with exposure-sensitive compensation. To generate the compensation, we introduce an exposure adversarial function as the key engine to fully extract valuable information from the observation. By applying the defined function, we construct a segmented shrinkage iterative scheme to generate the desired compensation. Its shrinkage nature supplies powerful support for algorithmic stability and robustness. Extensive experimental evaluations fully reveal the superiority of our proposed PEC. The code is available at https://rsliu.tech/PEC.
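To convey why a shrinkage iteration favors stability, here is a loose sketch: a multiplicative compensation is updated toward a target mean intensity with a step size that shrinks geometrically, so later iterations can only make bounded corrections. The update rule, target, and step schedule are all illustrative assumptions, not the paper's actual algorithm:

```python
def correct_exposure(pixels, target=0.5, steps=8):
    """Iteratively build a multiplicative gain that pulls the mean
    intensity toward `target`, halving the step size each iteration
    (shrinkage) so the scheme cannot oscillate or diverge."""
    gain = 1.0
    for k in range(steps):
        mean = sum(min(1.0, p * gain) for p in pixels) / len(pixels)
        gain += (target - mean) * (0.5 ** k)  # geometrically shrinking step
    return [min(1.0, p * gain) for p in pixels]
```

The shrinking step bounds the total correction by a geometric series, which is the kind of stability argument the abstract attributes to the scheme's shrinkage nature.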